Reviews: Bandit Learning with Positive Externalities

Neural Information Processing Systems

The paper studies the interesting problem of learning with externalities in a multi-armed bandit (MAB) setting. The main idea is that the preferences of users arriving at on-line platforms may be biased. Specifically, future arrivals are likely to have preferences similar to those of users who previously accessed the same platform and were satisfied with the service. Since some on-line platforms use MAB algorithms to optimize their service, the authors propose the Balanced Exploration (BE) MAB algorithm, whose structured exploration strategy takes this potential "future user preference bias" (referred to as "positive externalities") into account. The bias in user preferences translates directly into user-specific reward values: out of the m possible items/arms, each user prefers a subset of them (the reward for these arms is a Bernoulli reward with mean proportional to the arm's popularity), and the rewards of all other arms are always null.
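To make the model concrete, here is a minimal sketch of the reward dynamics described in the review, under stated assumptions: the function names, the `base_pref` smoothing term, and the fixed per-arm quality `theta` are illustrative choices, not taken from the paper (whose exact coupling between popularity and the Bernoulli mean may differ).

```python
import random

# Sketch of the reward model: each arriving user prefers a subset of the
# m arms; preferred arms pay a Bernoulli reward, all other arms pay zero.
# The chance that a new user prefers an arm grows with the arm's
# accumulated reward -- the positive externality.

def sample_preferences(popularity, base_pref=1.0):
    """Draw one user's preferred-arm subset; arms with higher accumulated
    reward (popularity) are preferred more often. base_pref keeps every
    arm reachable early on (illustrative smoothing, not from the paper)."""
    total = sum(popularity) + base_pref * len(popularity)
    return {a for a, p in enumerate(popularity)
            if random.random() < (p + base_pref) / total}

def pull(arm, prefs, theta):
    """Bernoulli reward with mean theta[arm] if the user prefers the arm;
    null reward otherwise."""
    return int(arm in prefs and random.random() < theta[arm])

theta = [0.9, 0.5, 0.2]          # assumed intrinsic arm qualities
popularity = [0.0] * len(theta)  # cumulative reward per arm
for _ in range(1000):
    prefs = sample_preferences(popularity)
    arm = random.randrange(len(theta))   # stand-in for a MAB policy
    popularity[arm] += pull(arm, prefs, theta)
```

Replacing the uniform arm choice with a policy like BE is where the algorithmic question lives: early rewards reshape which users arrive later, so naive exploitation can lock the platform onto a suboptimal arm.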


Player Modeling via Multi-Armed Bandits

Gray, Robert C., Zhu, Jichen, Arigo, Dannielle, Forman, Evan, Ontañón, Santiago

arXiv.org Artificial Intelligence

This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players' social comparison orientation (SCO) and present empirical results from both simulations and real players.
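As a rough illustration of the pre-study evaluation idea described above, the sketch below runs an epsilon-greedy bandit against simulated players with hidden preferences. All names, the binary engagement signal, and the 50-interaction horizon are assumptions for illustration, not the paper's actual simulation setup.

```python
import random

# Simulated player with hidden per-variant engagement rates; the bandit
# must discover them from noisy binary signals within a short study.

def simulated_player(k, seed):
    rng = random.Random(seed)
    return [rng.random() for _ in range(k)]  # hidden engagement rates

def epsilon_greedy(means, horizon=50, eps=0.1, seed=0):
    rng = random.Random(seed)
    k = len(means)
    counts, sums = [0] * k, [0.0] * k
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps or 0 in counts:   # explore (or fill gaps)
            arm = rng.randrange(k)
        else:                                   # exploit best estimate
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1; sums[arm] += r; total += r
    return total / horizon  # average engagement achieved

# Average over many simulated players to compare strategies offline,
# before committing to an expensive user study.
scores = [epsilon_greedy(simulated_player(4, s), seed=s) for s in range(200)]
print(sum(scores) / len(scores))
```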


Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits

Gray, Robert C., Zhu, Jichen, Ontañón, Santiago

arXiv.org Artificial Intelligence

This paper explores multi-armed bandit (MAB) strategies in very short horizon scenarios, i.e., when the bandit strategy is only allowed very few interactions with the environment. This is an understudied setting in the MAB literature with many applications in the context of games, such as player modeling. Specifically, we pursue three different ideas. First, we explore the use of regression oracles, which replace the simple average used in strategies such as epsilon-greedy with linear regression models. Second, we examine different exploration patterns such as forced exploration phases. Finally, we introduce a new variant of the UCB1 strategy called UCBT that has interesting properties and no tunable parameters. We present experimental results in a domain motivated by exergames, where the goal is to maximize a player's daily steps. Our results show that the combination of epsilon-greedy or epsilon-decreasing with regression oracles outperforms all other tested strategies in the short horizon setting.
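The following sketch shows one plausible reading of the regression-oracle idea: each arm's value estimate comes from a least-squares line fit over (time, reward) pairs rather than a running average. The paper's exact oracle may differ; `fit_predict`, `eps_greedy_regression`, and the fit-over-time design are assumptions for illustration.

```python
import random

def fit_predict(ts, rs, t_next):
    """Least-squares line through (ts, rs); predict the reward at t_next.
    Falls back to the last observation when there are too few points."""
    n = len(ts)
    if n < 2:
        return rs[-1] if rs else 0.0
    mt, mr = sum(ts) / n, sum(rs) / n
    var = sum((t - mt) ** 2 for t in ts)
    cov = sum((t - mt) * (r - mr) for t, r in zip(ts, rs))
    slope = cov / var if var else 0.0
    return mr + slope * (t_next - mt)

def eps_greedy_regression(arms, horizon, reward_fn, eps=0.1, seed=0):
    rng = random.Random(seed)
    hist = {a: ([], []) for a in range(arms)}  # (times, rewards) per arm
    total = 0.0
    for t in range(horizon):
        if rng.random() < eps or any(not h[0] for h in hist.values()):
            arm = rng.randrange(arms)           # explore / fill gaps
        else:                                   # exploit oracle predictions
            arm = max(range(arms), key=lambda a: fit_predict(*hist[a], t))
        r = reward_fn(arm, t)
        hist[arm][0].append(t); hist[arm][1].append(r)
        total += r
    return total
```

The short horizon is the interesting regime here: with only a handful of pulls per arm, how the estimator extrapolates from two or three points dominates performance.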


Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes

Liu, Larkin, Downe, Richard, Reid, Joshua

arXiv.org Machine Learning

We survey various Multi-Armed Bandit (MAB) strategies to examine their performance under non-stationary stochastic reward functions in conjunction with delayed feedback. We run several MAB simulations of an online eCommerce platform for grocery pick up, optimizing for product availability. In this work, we evaluate several popular MAB strategies, such as $\epsilon$-greedy, UCB1, and Thompson Sampling, and compare their respective performances in the context of regret minimization. We run the analysis in the scenario where the reward function is non-stationary and, furthermore, the process experiences delayed feedback, where the reward function is not immediately responsive to the arm played. We devise a new Bayesian technique (BAG1) tailored for non-stationary reward functions in the delayed feedback scenario. The simulation results show that BAG1 achieves superior performance in the context of regret minimization compared to traditional MAB strategies.
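Since the abstract does not specify BAG1's internals, the sketch below instead shows a generic discounted Thompson Sampling baseline in the setting the survey evaluates: rewards arrive `delay` steps after a pull, and Beta posteriors decay toward the prior (rate `gamma`) so stale evidence fades as the reward function drifts. All names and parameters are illustrative, not the authors' method.

```python
import random

def discounted_thompson(means_fn, horizon=1000, delay=5, gamma=0.99, seed=0):
    """means_fn(t) returns the (possibly drifting) Bernoulli mean of each
    arm at time t; rewards are revealed `delay` steps after the pull."""
    rng = random.Random(seed)
    k = len(means_fn(0))
    alpha, beta = [1.0] * k, [1.0] * k   # Beta(1, 1) priors
    pending, total = [], 0
    for t in range(horizon):
        while pending and pending[0][0] <= t:   # apply matured feedback
            _, a, r = pending.pop(0)
            alpha[a] += r; beta[a] += 1 - r
        # decay evidence toward the prior so old observations fade
        alpha = [1 + gamma * (x - 1) for x in alpha]
        beta = [1 + gamma * (x - 1) for x in beta]
        arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        r = int(rng.random() < means_fn(t)[arm])
        total += r
        pending.append((t + delay, arm, r))     # reward lands later
    return total

# Example: two arms whose qualities swap halfway through the run.
discounted_thompson(lambda t: [0.7, 0.3] if t < 500 else [0.3, 0.7])
```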